-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[OMPIRBuilder] - Fix emitTargetTaskProxyFunc to not generate empty functions #126958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OMPIRBuilder] - Fix emitTargetTaskProxyFunc to not generate empty functions #126958
Conversation
…nctions This is a fix for llvm#126949 There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task i.e the code that makes the target call spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. This can cause us to call target kernel twice - once via the target task proxy function and a second time via the host fallback This PR fixes both of these problems and updates some tests to catch these problems should this patch fail.
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp Author: Pranav Bhandarkar (bhandarkar-pranav) ChangesThis is a fix for #126949 There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task i.e the code that makes the target call spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. This can cause us to call target kernel twice - once via the target task proxy function and a second time via the host fallback This PR fixes both of these problems and updates some tests to catch these problems should this patch fail. Full diff: https://github.com/llvm/llvm-project/pull/126958.diff 3 Files Affected:
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 91fc16e54c88f..69c76f7400904 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -7022,9 +7022,10 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
Function *KernelLaunchFunction = StaleCI->getCalledFunction();
// StaleCI is the CallInst which is the call to the outlined
- // target kernel launch function. If there are values that the
- // outlined function uses then these are aggregated into a structure
- // which is passed as the second argument. If not, then there's
+ // target kernel launch function. If there are local live-in values
+ // that the outlined function uses then these are aggregated into a structure
+ // which is passed as the second argument. If there are no local live-in valluess
+ // or if all values used by the outlined kernel are global variables, then there's
// only one argument, the threadID. So, StaleCI can be
//
// %structArg = alloca { ptr, ptr }, align 8
@@ -7063,6 +7064,8 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
// host and device.
assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
"StaleCI with shareds should have exactly two arguments.");
+
+ Value *ThreadId = ProxyFn->getArg(0);
if (HasShareds) {
auto *ArgStructAlloca = dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
assert(ArgStructAlloca &&
@@ -7073,7 +7076,6 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
AllocaInst *NewArgStructAlloca =
Builder.CreateAlloca(ArgStructType, nullptr, "structArg");
Value *TaskT = ProxyFn->getArg(1);
- Value *ThreadId = ProxyFn->getArg(0);
Value *SharedsSize =
Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));
@@ -7086,7 +7088,9 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize);
Builder.CreateCall(KernelLaunchFunction, {ThreadId, NewArgStructAlloca});
- }
+ } else
+ Builder.CreateCall(KernelLaunchFunction, {ThreadId});
+
Builder.CreateRetVoid();
return ProxyFn;
}
@@ -7229,11 +7233,28 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
Builder, AllocaIP, ToBeDeleted, TargetTaskAllocaIP, "global.tid", false));
Builder.restoreIP(TargetTaskBodyIP);
-
if (Error Err = TaskBodyCB(DeviceID, RTLoc, TargetTaskAllocaIP))
return Err;
- OI.ExitBB = Builder.saveIP().getBlock();
+ // The outliner (CodeExtractor) extract a sequence or vector of blocks that
+ // it is given. These blocks are enumerated by
+ // OpenMPIRBuilder::OutlineInfo::collectBlocks which expects the OI.ExitBlock
+ // to be outside the region. In other words, OI.ExitBlock is expected to be
+ // the start of the region after the outlining. We used to set OI.ExitBlock
+ // to the InsertBlock after TaskBodyCB is done. This is fine in most cases
+ // except when the task body is a single basic block. In that case,
+ // OI.ExitBlock is set to the single task body block and will get left out of
+ // the outlining process. So, simply create a new empty block to which we
+ // uncoditionally branch from where TaskBodyCB left off
+ BasicBlock *TargetTaskContBlock =
+ BasicBlock::Create(Builder.getContext(), "target.task.cont");
+
+ auto *CurFn = Builder.GetInsertBlock()->getParent();
+ emitBranch(TargetTaskContBlock);
+ emitBlock(TargetTaskContBlock, CurFn, /*IsFinished=*/true);
+
+ OI.ExitBB = TargetTaskContBlock;
+
OI.PostOutlineCB = [this, ToBeDeleted, Dependencies, HasNoWait,
DeviceID](Function &OutlinedFn) mutable {
assert(OutlinedFn.getNumUses() == 1 &&
diff --git a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
index 621a206e18053..ece32bb5419c6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
@@ -25,8 +25,29 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_depend_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call i32 @__tgt_target_kernel({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
+// CHECK: define internal void @omp_target_depend_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_depend__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
+
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_depend__l[[LINE]](ptr %[[ADDR_A:.*]])
// CHECK: store i32 100, ptr %[[ADDR_A]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_depend_..omp_par
diff --git a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
index 6b634226a3568..94d8d052d087e 100644
--- a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
@@ -20,10 +20,30 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_nowait_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call ptr @__kmpc_omp_target_task_alloc({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
// Verify that we directly emit a call to the "target" region's body from the
// parent function of the the `omp.target` op.
+// CHECK: define internal void @omp_target_nowait_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_nowait__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_nowait__l[[LINE]](ptr %[[ADDR_X:.*]])
// CHECK: store float 5{{.*}}, ptr %[[ADDR_X]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_nowait_..omp_par
|
@llvm/pr-subscribers-mlir-llvm Author: Pranav Bhandarkar (bhandarkar-pranav) ChangesThis is a fix for #126949 There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task i.e the code that makes the target call spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. This can cause us to call target kernel twice - once via the target task proxy function and a second time via the host fallback This PR fixes both of these problems and updates some tests to catch these problems should this patch fail. Full diff: https://github.com/llvm/llvm-project/pull/126958.diff 3 Files Affected:
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 91fc16e54c88f..69c76f7400904 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -7022,9 +7022,10 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
Function *KernelLaunchFunction = StaleCI->getCalledFunction();
// StaleCI is the CallInst which is the call to the outlined
- // target kernel launch function. If there are values that the
- // outlined function uses then these are aggregated into a structure
- // which is passed as the second argument. If not, then there's
+ // target kernel launch function. If there are local live-in values
+ // that the outlined function uses then these are aggregated into a structure
+ // which is passed as the second argument. If there are no local live-in valluess
+ // or if all values used by the outlined kernel are global variables, then there's
// only one argument, the threadID. So, StaleCI can be
//
// %structArg = alloca { ptr, ptr }, align 8
@@ -7063,6 +7064,8 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
// host and device.
assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
"StaleCI with shareds should have exactly two arguments.");
+
+ Value *ThreadId = ProxyFn->getArg(0);
if (HasShareds) {
auto *ArgStructAlloca = dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
assert(ArgStructAlloca &&
@@ -7073,7 +7076,6 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
AllocaInst *NewArgStructAlloca =
Builder.CreateAlloca(ArgStructType, nullptr, "structArg");
Value *TaskT = ProxyFn->getArg(1);
- Value *ThreadId = ProxyFn->getArg(0);
Value *SharedsSize =
Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));
@@ -7086,7 +7088,9 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize);
Builder.CreateCall(KernelLaunchFunction, {ThreadId, NewArgStructAlloca});
- }
+ } else
+ Builder.CreateCall(KernelLaunchFunction, {ThreadId});
+
Builder.CreateRetVoid();
return ProxyFn;
}
@@ -7229,11 +7233,28 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
Builder, AllocaIP, ToBeDeleted, TargetTaskAllocaIP, "global.tid", false));
Builder.restoreIP(TargetTaskBodyIP);
-
if (Error Err = TaskBodyCB(DeviceID, RTLoc, TargetTaskAllocaIP))
return Err;
- OI.ExitBB = Builder.saveIP().getBlock();
+ // The outliner (CodeExtractor) extract a sequence or vector of blocks that
+ // it is given. These blocks are enumerated by
+ // OpenMPIRBuilder::OutlineInfo::collectBlocks which expects the OI.ExitBlock
+ // to be outside the region. In other words, OI.ExitBlock is expected to be
+ // the start of the region after the outlining. We used to set OI.ExitBlock
+ // to the InsertBlock after TaskBodyCB is done. This is fine in most cases
+ // except when the task body is a single basic block. In that case,
+ // OI.ExitBlock is set to the single task body block and will get left out of
+ // the outlining process. So, simply create a new empty block to which we
+ // uncoditionally branch from where TaskBodyCB left off
+ BasicBlock *TargetTaskContBlock =
+ BasicBlock::Create(Builder.getContext(), "target.task.cont");
+
+ auto *CurFn = Builder.GetInsertBlock()->getParent();
+ emitBranch(TargetTaskContBlock);
+ emitBlock(TargetTaskContBlock, CurFn, /*IsFinished=*/true);
+
+ OI.ExitBB = TargetTaskContBlock;
+
OI.PostOutlineCB = [this, ToBeDeleted, Dependencies, HasNoWait,
DeviceID](Function &OutlinedFn) mutable {
assert(OutlinedFn.getNumUses() == 1 &&
diff --git a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
index 621a206e18053..ece32bb5419c6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
@@ -25,8 +25,29 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_depend_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call i32 @__tgt_target_kernel({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
+// CHECK: define internal void @omp_target_depend_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_depend__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
+
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_depend__l[[LINE]](ptr %[[ADDR_A:.*]])
// CHECK: store i32 100, ptr %[[ADDR_A]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_depend_..omp_par
diff --git a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
index 6b634226a3568..94d8d052d087e 100644
--- a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
@@ -20,10 +20,30 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_nowait_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call ptr @__kmpc_omp_target_task_alloc({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
// Verify that we directly emit a call to the "target" region's body from the
// parent function of the the `omp.target` op.
+// CHECK: define internal void @omp_target_nowait_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_nowait__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_nowait__l[[LINE]](ptr %[[ADDR_X:.*]])
// CHECK: store float 5{{.*}}, ptr %[[ADDR_X]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_nowait_..omp_par
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Pranav. Just one suggestion to extend testing.
BasicBlock *TargetTaskContBlock = | ||
BasicBlock::Create(Builder.getContext(), "target.task.cont"); | ||
|
||
auto *CurFn = Builder.GetInsertBlock()->getParent(); | ||
emitBranch(TargetTaskContBlock); | ||
emitBlock(TargetTaskContBlock, CurFn, /*IsFinished=*/true); | ||
|
||
OI.ExitBB = TargetTaskContBlock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use splitBB
here.
BasicBlock *TargetTaskContBlock = | |
BasicBlock::Create(Builder.getContext(), "target.task.cont"); | |
auto *CurFn = Builder.GetInsertBlock()->getParent(); | |
emitBranch(TargetTaskContBlock); | |
emitBlock(TargetTaskContBlock, CurFn, /*IsFinished=*/true); | |
OI.ExitBB = TargetTaskContBlock; | |
OI.ExitBB = splitBB(Builder, /*CreateBranch=*/true, "target.task.cont"); | |
emitBlock(OI.ExitBB, OI.ExitBB->getParent(), /*IsFinished=*/true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emitBlock
isn't needed as splitBB
calls BasicBlock::create
with a Function *
. This results in the block being inserted into the function. Calling emitBlock
after splitBlock
triggers asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just minor nits from me. I also agree with @ergawy about using splitBB
to simplify the implementation.
This is a fix for #126949 There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task i.e the code that makes the target call spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. This can cause us to call target kernel twice - once via the target task proxy function and a second time via the host fallback
This PR fixes both of these problems and updates some tests to catch these problems should this patch fail.